Abstract: In this paper, we show that nested lattice codes
achieve the capacity of arbitrary channels with or without
non-causal state information at the transmitter. We also show
that nested lattice codes are optimal for source coding with or
without non-causal side information at the receiver for arbitrary
continuous sources.

Index Terms: linear codes, lattice codes, Gelfand-Pinsker
problem, Wyner-Ziv problem

I. Introduction 

LATTICE codes for continuous sources and channels are 
the analogue of linear codes for discrete sources and 
channels and play an important role in information theory and 
communications. Linear/lattice and nested linear/lattice codes 
have been used in many communication settings to improve 
upon the existing random coding bounds IH-ISl. 
In [3] and [4] the existence of lattice codes satisfying Shannon's
bound has been shown. These results have been generalized and the
close relation between linear and lattice codes has been pointed
out in [5]. Several results regarding lattice quantization noise at
high resolution have been derived, and the problem of constructing
lattices with an arbitrary quantization noise distribution has been
studied in [10].

Nested lattice codes were introduced in [11], where the
concept of structured binning is presented. Nested linear/lattice
codes are important because in many communication problems,
especially multi-terminal settings, such codes can be superior
in average performance compared to random codes 0. It has 
been shown in [12] that nested lattice codes are optimal for
the Wyner-Ziv problem when the source and side information
are jointly Gaussian. The dual problem of channel coding with
state information has been addressed in [13] and the optimality
of lattice codes for Gaussian channels has been shown. In [14]
it has been shown that nested linear codes are optimal for
discrete channels with state information at the transmitter.

In this paper we focus on two problems: 1) point-to-point
channel coding with state information at the encoder (the
Gelfand-Pinsker problem [15]) and 2) lossy source coding
with side information at the decoder (the Wyner-Ziv problem
[16], [17]). We consider these two problems in their most
general settings, i.e. when the source and the channel are
arbitrary. We use nested lattice codes with joint typicality
decoding rather than lattice decoding. We show that in both
settings, from an information-theoretic point of view, nested
lattice codes are optimal.

This work was supported by NSF grants CCF-0915619 and CCF-1116021.



The paper is organized as follows: in Section II we present
the required preliminaries and introduce our notation. In
Section III we show the optimality of nested lattice codes for
channels with state information (the Gelfand-Pinsker problem). We
show the optimality of nested lattice codes for source coding
with side information (the Wyner-Ziv problem) in Section IV
and we finally conclude in Section V.

II. Preliminaries 

1) Channel Model: We consider continuous memoryless
channels with knowledge of channel state information at
the transmitter, used without feedback. We associate two
sets $\mathcal{X}$ and $\mathcal{Y}$ with the channel as the channel
input and output alphabets. The set of channel states is denoted by
$\mathcal{S}$ and it is assumed that the channel state is distributed
over $\mathcal{S}$ according to $P_S$. When the state of the channel
is $s \in \mathcal{S}$, the input-output relation of the channel is
characterized by a transition kernel $W_{Y|XS}(y|x,s)$ for
$x \in \mathcal{X}$ and $y \in \mathcal{Y}$. We assume that the state
of the channel is known at the transmitter non-causally. The channel
is specified by $(\mathcal{X}, \mathcal{Y}, \mathcal{S}, P_S, W_{Y|XS}, w)$
where $w : \mathcal{X} \times \mathcal{S} \to \mathbb{R}^+$ is the cost
function.

2) Source Model: The source is modeled as a discrete-time
random process $X$ with each sample taking values in a fixed
set $\mathcal{X}$ called the alphabet. Assume $X$ is distributed
jointly with a random variable $S$ according to the measure $P_{XS}$
over $\mathcal{X} \times \mathcal{S}$ where $\mathcal{S}$ is an
arbitrary set. We assume that the side information $S$ is known to
the receiver non-causally. The reconstruction alphabet is denoted by
$\mathcal{U}$ and the quality of reconstruction is measured by a
single-letter distortion function
$d : \mathcal{X} \times \mathcal{U} \to \mathbb{R}^+$. We denote such
sources by $(\mathcal{X}, \mathcal{S}, \mathcal{U}, P_{XS}, d)$.

3) Linear and Coset Codes Over $\mathbb{Z}_p$: For a prime number
$p$, a linear code over $\mathbb{Z}_p$ of length $n$ and rate
$R = \frac{k}{n}\log p$ is a collection of $p^k$ codewords of length
$n$ which is closed under mod-$p$ addition (and hence mod-$p$
multiplication). In other words, linear codes over $\mathbb{Z}_p$ are
subspaces of $\mathbb{Z}_p^n$. Any such code can be characterized by
its generator matrix $G \in \mathbb{Z}_p^{k \times n}$. This follows
from the fact that any subgroup of an Abelian group corresponds to the
image of a homomorphism into that group. The linear encoder maps a
message tuple $u \in \mathbb{Z}_p^k$ to the codeword $x$ where
$x = uG$ and the operations are done mod-$p$. The set of all message
tuples for this code is $\mathbb{Z}_p^k$ and the set of all codewords
is the range of the matrix $G$, i.e.

$$\mathbb{C} = \{uG \mid u \in \mathbb{Z}_p^k\} \qquad (1)$$
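The encoding rule behind equation (1) can be sketched in a few lines of Python. This is only an illustration: the generator matrix `G` and the helper names `encode` and `linear_code` are hypothetical and do not come from the paper.

```python
from itertools import product

def encode(u, G, p):
    """Return the codeword x = uG with all arithmetic done mod p."""
    n = len(G[0])
    return tuple(sum(u[i] * G[i][j] for i in range(len(G))) % p for j in range(n))

def linear_code(G, p):
    """Enumerate {uG : u in Z_p^k} as a set of length-n tuples."""
    k = len(G)
    return {encode(u, G, p) for u in product(range(p), repeat=k)}

p = 3
G = [[1, 0, 2],
     [0, 1, 1]]  # hypothetical 2 x 3 generator matrix over Z_3
C = linear_code(G, p)
```

Because the two rows of this `G` are linearly independent over Z_3, the code contains all p^k distinct codewords; a rank-deficient generator matrix would produce fewer.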

A coset code over $\mathbb{Z}_p$ is a shift of a linear code by a
fixed vector. A coset code of length $n$ and rate
$R = \frac{k}{n}\log p$ is characterized by its generator matrix
$G \in \mathbb{Z}_p^{k \times n}$ and its shift vector (dither)
$B \in \mathbb{Z}_p^n$. The encoding rule for the corresponding coset
code is given by $x = uG + B$, where $u$ is the message tuple and $x$
is the corresponding codeword, i.e.

$$\mathbb{C} = \{uG + B \mid u \in \mathbb{Z}_p^k\} \qquad (2)$$



In a similar manner, any linear code over $\mathbb{Z}_p$ of length $n$
and rate (at least) $R = \frac{n-k}{n}\log p$ is characterized by its
parity check matrix $H \in \mathbb{Z}_p^{k \times n}$. This follows
from the fact that any subgroup of an Abelian group corresponds to the
kernel of a homomorphism from that group. The set of all codewords of
the code is the kernel of the matrix $H$, i.e.

$$\mathbb{C} = \{u \in \mathbb{Z}_p^n \mid Hu = 0\} \qquad (3)$$

where the operations are done mod-$p$. Note that there are at least
$p^{n-k}$ codewords in this set. A coset code over $\mathbb{Z}_p$ is a
shift of a linear code by a fixed vector. A coset code of length $n$
and rate (at least) $R = \frac{n-k}{n}\log p$ can be characterized by
its parity check matrix $H \in \mathbb{Z}_p^{k \times n}$ and its bias
vector $c \in \mathbb{Z}_p^k$ as follows:

$$\mathbb{C} = \{u \in \mathbb{Z}_p^n \mid Hu = c\} \qquad (4)$$

where the operations are done mod-$p$.
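As a small illustration of equations (3) and (4), the following sketch enumerates a parity-check code and one of its cosets by exhaustive search. The matrix `H`, the bias vectors and the function name `coset_code` are assumptions chosen for the example only.

```python
from itertools import product

def coset_code(H, c, p):
    """Enumerate {u in Z_p^n : Hu = c (mod p)} by exhaustive search."""
    n = len(H[0])

    def syndrome(u):
        return tuple(sum(h * x for h, x in zip(row, u)) % p for row in H)

    return [u for u in product(range(p), repeat=n) if syndrome(u) == tuple(c)]

p, n = 3, 3
H = [[1, 1, 1],
     [0, 1, 2]]  # hypothetical k x n parity check matrix with k = 2
kernel = coset_code(H, (0, 0), p)   # linear code, as in equation (3)
shifted = coset_code(H, (1, 2), p)  # coset code, as in equation (4)
```

Here `H` has full row rank, so both sets contain exactly p^(n-k) vectors; with a rank-deficient `H` the kernel would be larger, which is why the rate is stated as "at least".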



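4) Lattice Codes and Shifted Lattice Codes: A lattice code
of length $n$ is a collection of codewords in $\mathbb{R}^n$ which is
closed under real addition. A shifted lattice code is any translation
of a lattice code by a real vector. In this paper, we use coset
codes to construct (shifted) lattice codes as follows: Given a
coset code $\mathbb{C}$ of length $n$ over $\mathbb{Z}_p$ and a step
size $\gamma$, define

$$\bar{\Lambda}(\mathbb{C},\gamma,p) = \gamma\left(\mathbb{C} - \frac{p-1}{2}\right) \qquad (5)$$

Then the corresponding mod-$p$ lattice code $\Lambda(\mathbb{C},\gamma,p)$
is the disjoint union of shifts of $\bar{\Lambda}$ by vectors in
$\gamma p\mathbb{Z}^n$, i.e.

$$\Lambda(\mathbb{C},\gamma,p) = \bigcup_{v \in p\mathbb{Z}^n} (\gamma v + \bar{\Lambda})$$

It can be shown that this definition is equivalent to:

$$\Lambda(\mathbb{C},\gamma,p) = \left\{\gamma\left(u - \frac{p-1}{2}\right) \,\middle|\, u \in \mathbb{Z}^n,\ u \bmod p \in \mathbb{C}\right\}$$

Note that $\bar{\Lambda}(\mathbb{C},\gamma,p) \subset \Lambda(\mathbb{C},\gamma,p)$
is a scaled and shifted copy of the linear code $\mathbb{C}$.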
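The construction in equation (5) and the equivalent membership condition can be illustrated as follows. The code `C`, the step size `gamma` and the function names are hypothetical; exact rational arithmetic is used so that the integrality test is not affected by rounding.

```python
from fractions import Fraction

def base_points(C, gamma, p):
    """Scaled and shifted copy of the code: gamma * (c - (p - 1)/2)."""
    shift = Fraction(p - 1, 2)
    return {tuple(gamma * (ci - shift) for ci in c) for c in C}

def in_lattice(x, C, gamma, p):
    """Test x = gamma * (u - (p - 1)/2) with u an integer vector and u mod p in C."""
    shift = Fraction(p - 1, 2)
    u = [xi / gamma + shift for xi in x]
    if any(ui.denominator != 1 for ui in u):
        return False
    return tuple(int(ui) % p for ui in u) in C

p, gamma = 3, Fraction(1, 2)
C = {(0, 0), (1, 2), (2, 1)}  # hypothetical linear code over Z_3 generated by (1, 2)
base = base_points(C, gamma, p)
# Translate a base point by gamma * v with v = (3, -6) in p * Z^n.
translated = (Fraction(0) + gamma * 3, Fraction(1, 2) - gamma * 6)
```

Every point of `base` passes `in_lattice`, and so does any translate of a base point by a vector in gamma * p * Z^n, which mirrors the "disjoint union of shifts" description.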



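5) Nested Linear Codes: A nested linear code consists of
two linear codes, with the property that one of the codes (the
inner linear code) is a subset of the other code (the outer
linear code). For positive integers $k$ and $l$, let the inner and
outer codes $\mathbb{C}_i$ and $\mathbb{C}_o$ be linear codes over
$\mathbb{Z}_p$ characterized by their generator matrices
$G \in \mathbb{Z}_p^{k \times n}$ and $G' \in \mathbb{Z}_p^{(k+l) \times n}$
and their shift vectors $B \in \mathbb{Z}_p^n$ and
$B' \in \mathbb{Z}_p^n$ respectively. Furthermore, assume

$$G' = \begin{bmatrix} G \\ \Delta G \end{bmatrix}, \qquad B' = B$$

for some $\Delta G \in \mathbb{Z}_p^{l \times n}$. In this case,

$$\mathbb{C}_o = \{aG + m\Delta G + B \mid a \in \mathbb{Z}_p^k,\ m \in \mathbb{Z}_p^l\}, \qquad (6)$$
$$\mathbb{C}_i = \{aG + B \mid a \in \mathbb{Z}_p^k\} \qquad (7)$$

It is clear that the inner code is contained in the outer code.
Furthermore, the inner code induces a partition of the outer
code through its shifts. For $m \in \mathbb{Z}_p^l$ define the $m$th
bin of $\mathbb{C}_i$ in $\mathbb{C}_o$ as

$$\mathbb{B}_m = \{aG + m\Delta G + B \mid a \in \mathbb{Z}_p^k\}$$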
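The binning structure of equations (6) and (7) can be checked on a toy example. The matrices `G`, `dG` (standing for the increment matrix) and the shift `B` below are hypothetical, as are the helper names.

```python
from itertools import product

def vec_mat(u, M, p):
    """Row vector times matrix, reduced mod p."""
    n = len(M[0])
    return tuple(sum(ui * row[j] for ui, row in zip(u, M)) % p for j in range(n))

def bins(G, dG, B, p):
    """Map each bin index m to the set {aG + m*dG + B : a in Z_p^k}."""
    k, l = len(G), len(dG)
    out = {}
    for m in product(range(p), repeat=l):
        shift = tuple((x + b) % p for x, b in zip(vec_mat(m, dG, p), B))
        out[m] = {tuple((x + s) % p for x, s in zip(vec_mat(a, G, p), shift))
                  for a in product(range(p), repeat=k)}
    return out

p = 3
G, dG, B = [[1, 1, 0]], [[0, 1, 2]], (1, 0, 2)  # hypothetical, k = l = 1, n = 3
nested = bins(G, dG, B, p)
outer = set().union(*nested.values())
```

In this example the stacked matrix has full row rank, so the p^l bins are pairwise disjoint and their union is the outer code; the bin with index zero is the inner code itself.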

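Similarly, nested linear codes can be characterized by
the parity check representation of linear codes. For positive
integers $k$ and $l$, let the outer and inner codes $\mathbb{C}_o$ and
$\mathbb{C}_i$ be linear codes over $\mathbb{Z}_p$ characterized by
their parity check matrices $H \in \mathbb{Z}_p^{l \times n}$ and
$H' \in \mathbb{Z}_p^{(k+l) \times n}$ and their bias vectors
$c \in \mathbb{Z}_p^l$ and $c' \in \mathbb{Z}_p^{k+l}$ respectively.
Furthermore assume:

$$H' = \begin{bmatrix} H \\ \Delta H \end{bmatrix}, \qquad c' = \begin{bmatrix} c \\ \Delta c \end{bmatrix}$$

for some $\Delta H \in \mathbb{Z}_p^{k \times n}$ and
$\Delta c \in \mathbb{Z}_p^k$. In this case,

$$\mathbb{C}_o = \{u \in \mathbb{Z}_p^n \mid Hu = c\}, \qquad (8)$$
$$\mathbb{C}_i = \{u \in \mathbb{Z}_p^n \mid Hu = c,\ \Delta H u = \Delta c\} \qquad (9)$$

For $m \in \mathbb{Z}_p^k$ define the $m$th bin of $\mathbb{C}_i$ in
$\mathbb{C}_o$ as

$$\mathbb{B}_m = \{u \in \mathbb{Z}_p^n \mid Hu = c,\ \Delta H u = m\}$$

The outer code is the disjoint union of all the bins and each
bin index $m$ is considered as a message. We denote a
nested linear code by a pair $(\mathbb{C}_i, \mathbb{C}_o)$.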

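6) Nested Lattice Codes: Given a nested linear code
$(\mathbb{C}_i, \mathbb{C}_o)$ over $\mathbb{Z}_p$ and a step size
$\gamma$, define

$$\bar{\Lambda}_i(\mathbb{C}_i,\gamma,p) = \gamma\left(\mathbb{C}_i - \frac{p-1}{2}\right), \qquad (10)$$
$$\bar{\Lambda}_o(\mathbb{C}_o,\gamma,p) = \gamma\left(\mathbb{C}_o - \frac{p-1}{2}\right) \qquad (11)$$

Then the corresponding nested lattice code consists of an inner
lattice code and an outer lattice code

$$\Lambda_i(\mathbb{C}_i,\gamma,p) = \bigcup_{v \in p\mathbb{Z}^n} (\gamma v + \bar{\Lambda}_i) \qquad (12)$$
$$\Lambda_o(\mathbb{C}_o,\gamma,p) = \bigcup_{v \in p\mathbb{Z}^n} (\gamma v + \bar{\Lambda}_o) \qquad (13)$$

In this case as well, the inner lattice code induces a partition
of the outer lattice code. For a bin index $m$, define

$$\bar{\mathfrak{B}}_m = \gamma\left(\mathbb{B}_m - \frac{p-1}{2}\right) \qquad (14)$$

where $\mathbb{B}_m$ is the $m$th bin of $\mathbb{C}_i$ in
$\mathbb{C}_o$. The $m$th bin of the inner lattice code in the outer
lattice code is defined by:

$$\mathfrak{B}_m = \bigcup_{v \in p\mathbb{Z}^n} (\gamma v + \bar{\mathfrak{B}}_m)$$

The set of messages consists of the set of all bins of $\Lambda_i$ in
$\Lambda_o$. We denote a nested lattice code by a pair
$(\Lambda_i, \Lambda_o)$.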

7) Achievability for Channel Coding and the Capacity-Cost
Function: A transmission system with parameters
$(n, M, \Gamma, \tau)$ for reliable communication over a given channel
$(\mathcal{X}, \mathcal{Y}, \mathcal{S}, P_S, W_{Y|XS}, w)$ with cost
function $w : \mathcal{X} \times \mathcal{S} \to \mathbb{R}^+$
consists of an encoding mapping and a decoding mapping

$$e : \mathcal{S}^n \times \{1,2,\ldots,M\} \to \mathcal{X}^n$$
$$f : \mathcal{Y}^n \to \{1,2,\ldots,M\}$$

such that for all $m = 1, 2, \ldots, M$, if $s = (s_1, \ldots, s_n)$ and
$x = e(s,m) = (x_1, \ldots, x_n)$, then

$$\frac{1}{n}\sum_{i=1}^{n} w(x_i, s_i) < \Gamma$$

and the probability that the decoder output $f(Y^n)$ differs from $m$
is at most $\tau$.

Given a channel $(\mathcal{X}, \mathcal{Y}, \mathcal{S}, P_S, W_{Y|XS}, w)$,
a pair of non-negative numbers $(R, W)$ is said to be achievable if for
all $\epsilon > 0$ and for all sufficiently large $n$, there exists a
transmission system for reliable communication with parameters
$(n, M, \Gamma, \tau)$ such that

$$\frac{1}{n}\log M > R - \epsilon, \qquad \Gamma < W + \epsilon, \qquad \tau < \epsilon$$

The optimal capacity cost function $C(W)$ is given by the
supremum of $C$ such that $(C, W)$ is achievable.

8) Achievability for Source Coding and the Rate-Distortion
Function: A transmission system with parameters
$(n, \Theta, \Delta, \tau)$ for compressing a given source
$(\mathcal{X}, \mathcal{S}, \mathcal{U}, P_{XS}, d)$ consists of
an encoding mapping and a decoding mapping

$$e : \mathcal{X}^n \to \{1,2,\ldots,\Theta\},$$

such that the following condition is met:

$$P\left(d(X^n, g(e(X^n))) > \Delta\right) < \tau$$

where $X^n$ is the random vector of length $n$ generated by
the source. In this transmission system, $n$ denotes the block
length, $\log\Theta$ denotes the number of channel uses, $\Delta$
denotes the distortion level and $\tau$ denotes the probability of
exceeding the distortion level $\Delta$.

Given a source, a pair of non-negative real numbers $(R, D)$
is said to be achievable if for every $\epsilon > 0$ and
for all sufficiently large $n$ there exists a transmission system with
parameters $(n, \Theta, \Delta, \tau)$ for compressing the source such
that

$$\frac{1}{n}\log\Theta < R + \epsilon, \qquad \Delta < D + \epsilon, \qquad \tau < \epsilon$$

The optimal rate distortion function $R^*(D)$ of the source
is given by the infimum of the rates $R$ such that $(R, D)$ is
achievable.

9) Typicality: We use the notion of weak* typicality with
Prokhorov metric introduced in [18]. Let $\mathcal{M}(\mathbb{R}^d)$
be the set of probability measures on $\mathbb{R}^d$. For a subset $A$
of $\mathbb{R}^d$ define its $\epsilon$-neighborhood by

$$A^\epsilon = \{x \in \mathbb{R}^d \mid \exists\, y \in A \text{ such that } \|x - y\| < \epsilon\}$$

where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^d$. The
Prokhorov distance between two probability measures
$P_1, P_2 \in \mathcal{M}(\mathbb{R}^d)$ is defined as follows:

$$\pi_d(P_1, P_2) = \inf\{\epsilon > 0 \mid P_1(A) < P_2(A^\epsilon) + \epsilon \text{ and } P_2(A) < P_1(A^\epsilon) + \epsilon \ \ \forall \text{ Borel sets } A \text{ in } \mathbb{R}^d\}$$

Consider two random variables $X$ and $Y$ with joint distribution
$P_{XY}(\cdot,\cdot)$ over $\mathcal{X} \times \mathcal{Y} \subseteq \mathbb{R}^2$.
Let $n$ be an integer and $\epsilon$ be a positive real number. For the
sequence pair $(x, y)$ belonging to $\mathcal{X}^n \times \mathcal{Y}^n$
where $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$
define the empirical joint distribution by

$$\bar{P}_{xy}(A, B) = \frac{1}{n}\sum_{i=1}^{n} 1_{\{x_i \in A,\, y_i \in B\}}$$

for Borel sets $A$ and $B$. Let $\bar{P}_x$ and $\bar{P}_y$ be the
corresponding marginal probability measures. It is said that the
sequence $x$ is weakly* $\epsilon$-typical with respect to $P_X$ if

$$\pi_1(\bar{P}_x, P_X) < \epsilon$$

We denote the set of all weakly* $\epsilon$-typical sequences of length
$n$ by $A_\epsilon^n(X)$. Similarly, $x$ and $y$ are said to be jointly
weakly* $\epsilon$-typical with respect to $P_{XY}$ if

$$\pi_2(\bar{P}_{xy}, P_{XY}) < \epsilon$$

We denote the set of all weakly* $\epsilon$-typical sequence pairs of
length $n$ by $A_\epsilon^n(XY)$.

Given a sequence $x \in A_\epsilon^n(X)$, the set of conditionally
$\epsilon$-typical sequences $A_\epsilon^n(Y|x)$ is defined as

$$A_\epsilon^n(Y|x) = \{y \in \mathcal{Y}^n \mid (x, y) \in A_\epsilon^n(X, Y)\}$$

10) Notation: In our notation, $O(\epsilon)$ is any function of
$\epsilon$ such that $\lim_{\epsilon \to 0} O(\epsilon) = 0$ and for a
set $G$, $|G|$ denotes the cardinality (size) of $G$.



III. Channel Coding 

We show the achievability of the rate $R = I(U;Y) - I(U;S)$ for the
Gelfand-Pinsker channel using nested lattice codes for $U$.

Theorem III.1. For the channel
$(\mathcal{X}, \mathcal{Y}, \mathcal{S}, P_S, W_{Y|XS}, w)$, let
$w : \mathcal{X} \times \mathcal{S} \to \mathbb{R}^+$ be a continuous
cost function. Let $\mathcal{U}$ be an arbitrary set and let $SUXY$ be
distributed over
$\mathcal{S} \times \mathcal{U} \times \mathcal{X} \times \mathcal{Y}$
according to $P_S P_{U|S} W_{X|US} W_{Y|SX}$ where the conditional
distribution $P_{U|S}$ and the transition kernel $W_{X|US}$ are such
that $E\{w(X,S)\} \le W$. Then the pair $(R, W)$ is achievable using
nested lattice codes over $\mathcal{U}$ where $R = I(U;Y) - I(U;S)$.
A. Discrete U and Bounded Continuous Cost Function

In this section we prove the theorem for the case when
$U = \hat{U}$ takes values from the discrete set
$\gamma(\mathbb{Z}_p - \frac{p-1}{2})$ where $p$ is a prime and
$\gamma$ is a positive number. We use a random coding argument over
the ensemble of mod-$p$ lattice codes to prove the achievability. Let
$\mathbb{C}_o$ and $\mathbb{C}_i$ be defined as in (6) and (7) where
$G$ is a random matrix in $\mathbb{Z}_p^{k \times n}$, $\Delta G$ is a
random matrix in $\mathbb{Z}_p^{l \times n}$ and $B$ is a random vector
in $\mathbb{Z}_p^n$. Define $\Lambda_i(\mathbb{C}_i,\gamma,p)$ and
$\Lambda_o(\mathbb{C}_o,\gamma,p)$ accordingly. The ensemble of nested
lattice codes consists of all lattices of the form (10) and (11). The
set of messages consists of all bins $\mathfrak{B}_m$ indexed by
$m \in \mathbb{Z}_p^l$.

The encoder observes the message $m \in \mathbb{Z}_p^l$ and the channel
state $s \in \mathcal{S}^n$, looks for a vector $u$ in the $m$th bin
$\mathfrak{B}_m$ which is jointly weakly* typical with $s$, and encodes
the message $m$ to $x$ according to $W_{X|SU}$. The encoder declares an
error if it does not find such a vector.

After receiving $y \in \mathcal{Y}^n$, the decoder decodes it to
$m \in \mathbb{Z}_p^l$ if $m$ is the unique tuple such that the $m$th
bin $\mathfrak{B}_m$ contains a sequence jointly typical with $y$.
Otherwise it declares an error.

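1) Encoding Error: We begin with some definitions and
lemmas. Let

$$S' = \left[-\frac{\gamma(p-1)}{2}, \frac{\gamma(p-1)}{2}\right]^n \cap \gamma\mathbb{Z}^n \qquad (15)$$

For $a \in \mathbb{Z}_p^k$, $m \in \mathbb{Z}_p^l$, define

$$g(a,m) = \gamma\left[(aG + m\Delta G + B) - \frac{p-1}{2}\right]$$

$g(a,m)$ has the following properties:

Lemma III.1. For $a \in \mathbb{Z}_p^k$ and $m \in \mathbb{Z}_p^l$,
$g(a,m)$ is uniformly distributed over $S'$, i.e. for $u \in S'$,

$$P(g(a,m) = u) = \frac{1}{p^n}$$

Proof: Note that $B$ is independent of $G$ and $\Delta G$ and
therefore $aG + m\Delta G + B$ is a uniform variable over
$\mathbb{Z}_p^n$. The lemma follows by noting that
$S' = \gamma(\mathbb{Z}_p^n - \frac{p-1}{2})$. ■

Lemma III.2. For $a, \tilde{a} \in \mathbb{Z}_p^k$ and
$m \in \mathbb{Z}_p^l$, if $a \ne \tilde{a}$ then $g(a,m)$ and
$g(\tilde{a},m)$ are independent, i.e. for $u \in S'$ and
$\tilde{u} \in S'$,

$$P(g(a,m) = u,\ g(\tilde{a},m) = \tilde{u}) = \frac{1}{p^{2n}}$$

Proof: It suffices to show that $aG + m\Delta G + B$ and
$\tilde{a}G + m\Delta G + B$ are uniform over $\mathbb{Z}_p^n$ and
independent. Note that for $u, \tilde{u} \in \mathbb{Z}_p^n$,

$$P(aG + m\Delta G + B = u,\ \tilde{a}G + m\Delta G + B = \tilde{u})$$
$$= P(aG + m\Delta G + B = u,\ (\tilde{a} - a)G = \tilde{u} - u)$$
$$\overset{(a)}{=} P(aG + m\Delta G + B = u) \times P((\tilde{a} - a)G = \tilde{u} - u)$$
$$\overset{(b)}{=} \frac{1}{p^{2n}}$$

where (a) follows since $B$ is uniform over $\mathbb{Z}_p^n$ and
independent of $G$ and (b) follows since $B$ and $G$ are uniform
and $a - \tilde{a} \ne 0$. ■

Lemma III.3. For $a, \tilde{a} \in \mathbb{Z}_p^k$ and
$m, \tilde{m} \in \mathbb{Z}_p^l$, if $m \ne \tilde{m}$ then
$g(a,m)$ and $g(\tilde{a},\tilde{m})$ are independent, i.e. for
$u \in S'$ and $\tilde{u} \in S'$,

$$P(g(a,m) = u,\ g(\tilde{a},\tilde{m}) = \tilde{u}) = \frac{1}{p^{2n}}$$

Proof: The proof is similar to the proof of the previous
lemma and is omitted. ■

For a message $m \in \mathbb{Z}_p^l$ and state $s \in \mathcal{S}^n$,
the encoder declares error if there is no sequence in $\mathfrak{B}_m$
jointly typical with $s$. Define

$$\theta(s) = \sum_{a \in \mathbb{Z}_p^k} 1_{\{g(a,m) \in A_\epsilon^n(\hat{U}|s)\}}$$

Let $Z$ be a uniform random variable over
$\gamma(\mathbb{Z}_p - \frac{p-1}{2})$ and hence $Z^n$ a uniform random
variable over $S'$. Then, by Lemma III.1, we have

$$E\{\theta(s)\} = \sum_{a \in \mathbb{Z}_p^k} P(g(a,m) \in A_\epsilon^n(\hat{U}|s)) = p^k P(Z^n \in A_\epsilon^n(\hat{U}|s))$$

We need the following lemmas from [18] to proceed:

Lemma III.4. Let $P_{XY}$ be a joint distribution on $\mathbb{R}^2$
and $P_X$ and $P_Y$ denote its marginals. Let $Z^n$ be a random
sequence drawn according to $P_Z^n$. If $D(P_{XY}\|P_Z P_Y)$ is finite
then for each $\delta > 0$, there exists $\epsilon(\delta)$ such that
if $\epsilon < \epsilon(\delta)$ and $y \in A_\epsilon^n(P_Y)$ then

$$\limsup_{n \to \infty} \frac{1}{n}\log P_Z^n\left((Z^n, y) \in A_\epsilon^n(P_{XY})\right) \le -D(P_{XY}\|P_Z P_Y) + \delta$$

Proof: This lemma is a generalization of Theorem 21 of
[18]. The proof is provided in the Appendix. ■

Lemma III.5. Let $P_{XY}$ be a joint distribution on $\mathbb{R}^2$
and $P_X$ and $P_Y$ denote its marginals. Let $Z^n$ be a random
sequence drawn according to $P_Z^n$. Then for each
$\epsilon, \delta > 0$, there exists $\tilde{\epsilon}(\epsilon, \delta)$
such that if $y \in A_{\tilde{\epsilon}}^n(P_Y)$ then

$$\liminf_{n \to \infty} \frac{1}{n}\log P_Z^n\left((Z^n, y) \in A_\epsilon^n(P_{XY})\right) \ge -D(P_{XY}\|P_Z P_Y) - \delta$$

Proof: This lemma is a generalization of Theorem 22 of
[18]. The proof is provided in the Appendix. ■

Using these lemmas we get

$$E\{\theta(s)\} = p^k 2^{-n[D(P_{\hat{U}S}\|P_Z P_S) + O(\epsilon)]}$$

Similarly, let $Z^n = g(a,m)$ and $\tilde{Z}^n = g(\tilde{a},m)$. Note
that $Z^n$ and $\tilde{Z}^n$ are equal if $a = \tilde{a}$ and are
independent if $a \ne \tilde{a}$. We have

$$E\{\theta(s)^2\} = \sum_{a, \tilde{a} \in \mathbb{Z}_p^k} P\left(Z^n \in A_\epsilon^n(\hat{U}|s),\ \tilde{Z}^n \in A_\epsilon^n(\hat{U}|s)\right)$$
$$= p^k P(Z^n \in A_\epsilon^n(\hat{U}|s)) + p^k(p^k - 1) P(Z^n \in A_\epsilon^n(\hat{U}|s))^2$$
$$\le p^k 2^{-n[D(P_{\hat{U}S}\|P_Z P_S) + O(\epsilon)]} + E\{\theta(s)\}^2$$

Therefore

$$\mathrm{var}\{\theta(s)\} = E\{\theta(s)^2\} - E\{\theta(s)\}^2 \le p^k 2^{-n[D(P_{\hat{U}S}\|P_Z P_S) + O(\epsilon)]}$$

Hence,

$$P(\theta(s) = 0) \le P\left(|\theta(s) - E\{\theta(s)\}| \ge E\{\theta(s)\}\right) \overset{(a)}{\le} \frac{\mathrm{var}\{\theta(s)\}}{E\{\theta(s)\}^2} \le p^{-k} 2^{n[D(P_{\hat{U}S}\|P_Z P_S) + O(\epsilon)]}$$

where (a) follows from Chebyshev's inequality. This bound
is valid for all $s \in \mathcal{S}^n$. Therefore if

$$\frac{k}{n}\log p > D(P_{\hat{U}S}\|P_Z P_S) \qquad (16)$$

then the probability of encoding error goes to zero as the
block length increases.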
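The uniformity and pairwise independence claimed in Lemmas III.1 and III.2 can be verified exhaustively for very small parameters. The sketch below is only a sanity check under assumed toy values (p = 3, n = 2, k = l = 1); it works with the unscaled vector aG + mΔG + B, of which g(a, m) is a fixed one-to-one relabeling, and the names `codeword`, `single` and `pair` are hypothetical.

```python
from collections import Counter
from itertools import product

p, n = 3, 2  # toy parameters; k = l = 1, so a and m are scalars in Z_p

def codeword(a, m, G, dG, B):
    """Compute aG + m*dG + B (mod p) for 1 x n matrices G and dG."""
    return tuple((a * G[j] + m * dG[j] + B[j]) % p for j in range(n))

single, pair = Counter(), Counter()
# Enumerate the whole ensemble: every choice of G, dG and B is equally likely.
for G, dG, B in product(product(range(p), repeat=n), repeat=3):
    single[codeword(1, 2, G, dG, B)] += 1
    pair[codeword(1, 2, G, dG, B), codeword(2, 2, G, dG, B)] += 1
```

Each of the p^n values appears equally often in `single`, and each of the p^(2n) value pairs appears equally often in `pair`, which is the counting form of the two lemmas for this particular choice of a, ã and m.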

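2) Decoding Error: The decoder declares error if there is
no bin $\mathfrak{B}_m$ containing a sequence jointly typical with $y$,
where $y$ is the received channel output, or if there are multiple bins
containing sequences jointly typical with $y$. Assume that the
message $m$ has been encoded to $x$ according to $W_{X|SU}$ where
$u = g(a,m)$ and the channel state is $s$. The channel output $y$
is jointly typical with $u$ with high probability. Given $m$, $s$, $a$
and $u$, the probability of decoding error is upper bounded by

$$P_{err} \le \sum_{\tilde{m} \ne m} \sum_{\tilde{a} \in \mathbb{Z}_p^k} P\left(g(\tilde{a},\tilde{m}) \in A_\epsilon^n(\hat{U}|y) \,\middle|\, g(a,m) \in A_\epsilon^n(\hat{U}|y)\right) \overset{(a)}{\le} p^{k+l}\, 2^{-n[D(P_{\hat{U}Y}\|P_Z P_Y) + O(\epsilon)]}$$

where in (a) we use Lemmas III.3, III.4 and III.5. Hence the
probability of decoding error goes to zero if

$$\frac{k+l}{n}\log p < D(P_{\hat{U}Y}\|P_Z P_Y) \qquad (17)$$

3) The Achievable Rate: Using (16) and (17) we conclude
that if we choose $\frac{k}{n}\log p$ sufficiently close to
$D(P_{\hat{U}S}\|P_Z P_S)$ and $\frac{k+l}{n}\log p$ sufficiently close
to $D(P_{\hat{U}Y}\|P_Z P_Y)$ we can achieve the rate

$$R = \frac{l}{n}\log p \approx D(P_{\hat{U}Y}\|P_Z P_Y) - D(P_{\hat{U}S}\|P_Z P_S) = I(\hat{U};Y) - I(\hat{U};S)$$

The last equality holds because $Z$ is uniform over the $p$ values of
$\hat{U}$, so that $D(P_{\hat{U}V}\|P_Z P_V) = \log p - H(\hat{U}|V)$ for
$V = S$ and $V = Y$.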
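The final step, that the difference of the two divergences equals the difference of the two mutual informations when Z is uniform, can be checked numerically. The joint distributions below are made-up toy values (both with the same marginal for the discrete auxiliary variable), and the helper names are hypothetical.

```python
from math import log2

def kl(P, Q):
    """Discrete KL divergence in bits; P and Q are dicts on a common support."""
    return sum(pv * log2(pv / Q[key]) for key, pv in P.items() if pv > 0)

def marginal(P, axis):
    out = {}
    for key, pv in P.items():
        out[key[axis]] = out.get(key[axis], 0.0) + pv
    return out

def mutual_information(P):
    Pa, Pb = marginal(P, 0), marginal(P, 1)
    return kl(P, {(a, b): Pa[a] * Pb[b] for a in Pa for b in Pb})

def divergence_from_uniform(P, p):
    """D(P_UV || P_Z P_V) with Z uniform over the p values 0..p-1 of U."""
    Pv = marginal(P, 1)
    return kl(P, {(u, v): Pv[v] / p for u in range(p) for v in Pv})

p = 3
P_US = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3, (2, 0): 0.2, (2, 1): 0.1}
P_UY = {(0, 0): 0.25, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.35, (2, 0): 0.05, (2, 1): 0.25}
lhs = divergence_from_uniform(P_UY, p) - divergence_from_uniform(P_US, p)
rhs = mutual_information(P_UY) - mutual_information(P_US)
```

The two quantities `lhs` and `rhs` agree up to floating-point error, since the log p terms and the entropy of the auxiliary variable cancel in both differences.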



B. Arbitrary U and Bounded Continuous Cost Function

Let $Q = \{A_1, A_2, \ldots, A_r\}$ be a finite measurable partition
of $\mathbb{R}^d$. For random variables $U$ and $Y$ on $\mathbb{R}^d$
with measure $P_{UY}$ define the quantized random variables $U_Q$ and
$Y_Q$ on $Q$ with measure

$$P_{U_Q Y_Q}(A_i, A_j) = P_{UY}(A_i, A_j)$$

The Kullback-Leibler divergence between $U$ and $Y$ is defined
as

$$D(U\|Y) = \sup_Q D(U_Q\|Y_Q)$$

where $D(U_Q\|Y_Q)$ is the discrete Kullback-Leibler divergence
and the supremum is taken over all finite partitions $Q$ of
$\mathbb{R}^d$. Similarly, the mutual information between $U$ and $Y$
is defined as

$$I(U;Y) = \sup_Q I(U_Q;Y_Q)$$

where $I(U_Q;Y_Q)$ is the discrete mutual information between
the two random variables and the supremum is taken over all
finite partitions $Q$ of $\mathbb{R}^d$.

We have shown in Section III-A that for discrete random
variables the region given in Theorem III.1 is achievable. In
this part, we make a quantization argument to generalize this
result to arbitrary auxiliary random variables. Let $S, U, X, Y$
be distributed according to $P_S P_{U|S} W_{X|US} W_{Y|SX}$ where in
this case $U$ is an arbitrary random variable. We start with
the following theorem:

Theorem III.2. Let $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots$
be an increasing sequence of $\sigma$-algebras on a measurable set $A$.
Let $\mathcal{F}_\infty$ denote the $\sigma$-algebra generated by the
union $\bigcup_{n=1}^{\infty} \mathcal{F}_n$. Let $P$ and $Q$
be probability measures on $A$. Then

$$D(P|_{\mathcal{F}_n} \,\|\, Q|_{\mathcal{F}_n}) \to D(P|_{\mathcal{F}_\infty} \,\|\, Q|_{\mathcal{F}_\infty}) \quad \text{as } n \to \infty$$

where $P|_{\mathcal{F}}$ denotes the restriction of $P$ on $\mathcal{F}$.

Proof: Provided in [19] and [20] for example. ■
For a prime $p > 2$, a real positive number $\gamma$ and for
$i = 0, 1, \ldots, p-1$ define

$$a_i = -\frac{\gamma(p-1)}{2} + \gamma i$$

Define the quantization $Q_{\gamma,p}$ as
$Q_{\gamma,p} = \{A_0, A_1, \ldots, A_{p-1}\}$ where

$$A_0 = (-\infty, a_0]$$
$$A_i = (a_{i-1}, a_i], \quad \text{for } i = 1, \ldots, p-2$$
$$A_{p-1} = (a_{p-2}, +\infty)$$

Let the random variable $\hat{U}_{\gamma,p}$ take values from
$\{a_0, \ldots, a_{p-1}\}$ according to the joint measure

$$P_{S\hat{U}_{\gamma,p}XY}(\hat{U}_{\gamma,p} = a_i,\ SXY \in B) = P_{SUXY}(U \in A_i,\ SXY \in B) \qquad (18)$$

for all Borel sets $B \subseteq \mathbb{R}^3$. For a fixed $\gamma$,
let $p < q$ be two primes. Then the $\sigma$-algebra induced by
$Q_{\gamma,p}$ is included in the $\sigma$-algebra induced by
$Q_{\gamma,q}$. Therefore, for a fixed $\gamma$, we can
use the above theorem to get

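$$I(U|_{\mathcal{F}_{\gamma,p}};\, Y|_{\mathcal{F}_{\gamma,p}}) \to I(U|_{\mathcal{F}_{\gamma,\infty}};\, Y|_{\mathcal{F}_{\gamma,\infty}}) \quad \text{as } p \to \infty \qquad (19)$$

where $U|_{\mathcal{F}_{\gamma,\infty}}$ is a random variable over
$Q_{\gamma,\infty} = \{A_i \mid i \in \mathbb{Z}\}$, where
$A_i = (\gamma i, \gamma(i+1)]$, with measure
$P_{U|_{\mathcal{F}_{\gamma,\infty}}}(A_i) = P_U(A_i)$.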
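A minimal sketch of the scalar quantizer defined by the cells A_0, ..., A_{p-1}: each input is mapped to the reconstruction point a_i of the cell that contains it. The function names and the example values of gamma and p are assumptions made for illustration.

```python
import bisect

def quantizer_points(gamma, p):
    """Reconstruction points a_i = -gamma*(p-1)/2 + gamma*i for i = 0..p-1."""
    return [-gamma * (p - 1) / 2 + gamma * i for i in range(p)]

def quantize(u, gamma, p):
    """Map u to a_i, where A_0 = (-inf, a_0], A_i = (a_{i-1}, a_i] and
    A_{p-1} = (a_{p-2}, +inf)."""
    a = quantizer_points(gamma, p)
    # Index of the first a_i >= u among a_0..a_{p-2}; p-1 if none exists.
    i = bisect.bisect_left(a[:-1], u)
    return a[i]

points = quantizer_points(0.5, 5)
```

With gamma = 0.5 and p = 5 the reconstruction points are -1, -0.5, 0, 0.5 and 1; inputs beyond the outermost boundaries fall into the two unbounded cells and are mapped to -1 or 1.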

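Let $\gamma_0 = 1$ and define $\gamma_n = \frac{1}{2^n}$. Note that if
$m > n$ then $\mathcal{F}_{\gamma_n,\infty}$ is included in
$\mathcal{F}_{\gamma_m,\infty}$. Also, since dyadic intervals
generate the Borel sigma field ([21] for example), the
restriction of $U$ to the sigma algebra generated by
$\bigcup_{n=1}^{\infty} \mathcal{F}_{\gamma_n,\infty}$
is $U$ itself. We can use Theorem III.2 to get

$$I(U|_{\mathcal{F}_{\gamma_n,\infty}};\, Y|_{\mathcal{F}_{\gamma_n,\infty}}) \to I(U;Y) \quad \text{as } n \to \infty \qquad (20)$$

Combining (19) and (20) we conclude that for all $\epsilon > 0$, there
exist $\Gamma$ and $P$ such that if $\gamma < \Gamma$ and $p > P$ then

$$\left|I(U|_{\mathcal{F}_{\gamma,p}};\, Y|_{\mathcal{F}_{\gamma,p}}) - I(U;Y)\right| < \epsilon$$

Since quantization reduces the mutual information
($X_Q \to X \to Y$), we have

$$I(U|_{\mathcal{F}_{\gamma,p}};\, Y|_{\mathcal{F}_{\gamma,p}}) \le I(U|_{\mathcal{F}_{\gamma,p}};\, Y) \le I(U;Y)$$

Therefore $|I(U|_{\mathcal{F}_{\gamma,p}}; Y) - I(U;Y)| < \epsilon$.
Also note that
$I(U|_{\mathcal{F}_{\gamma,p}}; Y) = I(\hat{U}_{\gamma,p}; Y)$ since we
define the joint measure to be the same. Therefore

$$\left|I(\hat{U}_{\gamma,p};Y) - I(U;Y)\right| < \epsilon \qquad (21)$$

With a similar argument, for all $\epsilon > 0$ there exist $\gamma$
and $p$ such that

$$\left|I(\hat{U}_{\gamma,p};S) - I(U;S)\right| < \epsilon \qquad (22)$$

If we take the maximum of the two $p$'s and the minimum of
the two $\gamma$'s, we can say that for all $\epsilon > 0$ there exist
$\gamma$ and $p$ such that both (21) and (22) hold.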

Consider the sequence $P_{S\hat{U}_{\gamma_n,p}X}$ as
$n, p \to \infty$. In the next lemma we show that under certain
conditions this sequence converges in the weak* sense to $P_{SUX}$.

Lemma III.6. Consider the sequence $P_{S\hat{U}_{\gamma_n,p}X}$ where
$n \to \infty$ and $p$ is such that $\gamma_n p \to \infty$ as
$n \to \infty$ (take $p$ to be the smallest prime larger than $2^{2n}$
for example). Then the sequence converges to $P_{SUX}$ in the weak*
sense as $n \to \infty$.

Proof: It suffices to show that the three dimensional
cumulative distribution function $F_{S\hat{U}_{\gamma_n,p}X}$ converges
to $F_{SUX}$ point-wise in all points $(s,u,x) \in \mathbb{R}^3$ where
$F_{SUX}$ is continuous. Let $(s,u,x)$ be a point where $F_{SUX}$ is
continuous and for an arbitrary $\epsilon > 0$, let $\delta$ be such
that

$$|F_{SUX}(s, u - \delta, x) - F_{SUX}(s,u,x)| < \epsilon$$
$$|F_{SUX}(s, u + \delta, x) - F_{SUX}(s,u,x)| < \epsilon$$

Let $n$ be such that $\gamma_n = \frac{1}{2^n} < \delta$ and find $p$
accordingly. Then there exist points $a_i, a_j$ such that
$a_i \in [u - \delta, u]$ and $a_j \in [u, u + \delta]$. We have

$$F_{SUX}(s, u - \delta, x) \le F_{S\hat{U}_{\gamma_n,p}X}(s, a_i, x) \le F_{S\hat{U}_{\gamma_n,p}X}(s, u, x) \le F_{S\hat{U}_{\gamma_n,p}X}(s, a_j, x) \le F_{SUX}(s, u + \delta, x)$$

Therefore
$|F_{S\hat{U}_{\gamma_n,p}X}(s,u,x) - F_{SUX}(s,u,x)| < \epsilon$. This
shows the point-wise convergence of $F_{S\hat{U}_{\gamma_n,p}X}$. ■

The above lemma implies that
$E_{P_{S\hat{U}_{\gamma_n,p}X}}\{w(X,S)\}$ converges
to $E_{P_{SUX}}\{w(X,S)\} \le W$ since $w$ is assumed to be bounded
and continuous.

We have shown that for arbitrary $P_{U|S}$ and $W_{X|SU}$ one can
find $P_{\hat{U}|S}$ and $W_{X|S\hat{U}}$ induced from (18) such that
$\hat{U}$ is a discrete variable and

$$I(\hat{U};Y) - I(\hat{U};S) \approx I(U;Y) - I(U;S)$$
$$E_{P_{S\hat{U}X}}\{w(X,S)\} \approx E_{P_{SUX}}\{w(X,S)\}$$

Hence, using the result of Section III-A we have shown the
achievability of the rate region given in Theorem III.1 for
arbitrary auxiliary random variables when the cost function
is bounded and continuous.

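C. Arbitrary U and Continuous Cost Function

For a positive number $l$, define the clipped random variable
$\tilde{X}$ by $\tilde{X} = \mathrm{sign}(X)\min(l, |X|)$ and let
$\tilde{Y}$ be distributed according to
$W_{\tilde{Y}|\tilde{X}}(\cdot|\tilde{x}) = W_{Y|X}(\cdot|\tilde{x})$.

Lemma III.7. As $l \to \infty$, $I(U;\tilde{Y}) \to I(U;Y)$.

Proof: Note that for Borel sets $B_1, B_2, B_3$, if
$B_2 \subseteq (-l, l)$ then

$$P_{U\tilde{X}\tilde{Y}}(B_1, B_2, B_3) = P_{UXY}(B_1, B_2, B_3)$$

For any $\epsilon > 0$, let $Q = \{A_1, \ldots, A_r\}$ be a
quantization such that

$$|I(U_Q;Y_Q) - I(U;Y)| < \epsilon$$

For an arbitrary $\delta > 0$, assume $l$ is large enough such that
$P_X((-l,l)) > 1 - \delta$. Then

$$P_{U_Q Y_Q}(A_i, A_j) = P_{UXY}(A_i, \mathbb{R}, A_j)$$
$$= P_{UXY}(A_i, (-l,l), A_j) + P_{UXY}(A_i, (-\infty,-l] \cup [l,\infty), A_j)$$
$$\le P_{UXY}(A_i, (-l,l), A_j) + P_{UXY}(\mathbb{R}, (-\infty,-l] \cup [l,\infty), \mathbb{R})$$
$$= P_{U\tilde{X}\tilde{Y}}(A_i, (-l,l), A_j) + P_X((-\infty,-l] \cup [l,\infty))$$
$$\le P_{U\tilde{Y}}(A_i, A_j) + \delta$$
$$= P_{U_Q\tilde{Y}_Q}(A_i, A_j) + \delta$$

Also,

$$P_{U_Q Y_Q}(A_i, A_j) = P_{UXY}(A_i, \mathbb{R}, A_j)$$
$$\ge P_{UXY}(A_i, (-l,l), A_j)$$
$$= P_{U\tilde{X}\tilde{Y}}(A_i, (-l,l), A_j)$$
$$\ge P_{U\tilde{X}\tilde{Y}}(A_i, \mathbb{R}, A_j) - \delta$$
$$= P_{U\tilde{Y}}(A_i, A_j) - \delta$$
$$= P_{U_Q\tilde{Y}_Q}(A_i, A_j) - \delta$$

Since the choice of $\delta$ is arbitrary and since the discrete
mutual information is continuous, we conclude that as
$\epsilon, \delta \to 0$ (hence $l \to \infty$),
$I(U;\tilde{Y}) \to I(U;Y)$. ■

Since $\tilde{X}$ is bounded and $w$ is assumed to be continuous, $w$
is also bounded on the range of $\tilde{X}$. This completes the proof.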

IV. Source Coding 

In this section, we show the achievability of the rate
$R = I(U;X) - I(U;S)$ for the Wyner-Ziv problem using nested
lattice codes for $U$.

Theorem IV.1. For the source
$(\mathcal{X}, \mathcal{S}, \mathcal{U}, P_{XS}, d)$ assume
$d : \mathcal{X} \times \mathcal{U} \to \mathbb{R}^+$ is continuous.
Let $U$ be a random variable taking values from the set $\mathcal{U}$
jointly distributed with $X$ and $S$ according to $P_{XS}W_{U|X}$ where
$W_{U|X}(\cdot|\cdot)$ is a transition kernel. Further assume that
there exists a measurable function
$f : \mathcal{S} \times \mathcal{U} \to \mathcal{X}$ such that
$E\{d(X, f(S,U))\} \le D$. Then the rate $R^*(D) = I(X;U) - I(S;U)$ is
achievable using nested lattice codes.



A. Discrete U and Bounded Continuous Distortion Function 

In this section we prove the theorem for the case when
$U$ takes values from the discrete set
$\gamma(\mathbb{Z}_p - \frac{p-1}{2})$ where $p$ is a prime and
$\gamma$ is a positive number. The generalization
to the case where $U$ is arbitrary and the distortion function
is continuous is similar to the channel coding problem and
is omitted. We use a random coding argument over the
ensemble of mod-$p$ lattice codes to prove the achievability.
The ensemble of codes used for source coding is based on
the parity check matrix representation of linear and lattice
codes. Define the outer and inner linear codes as in (8) and
(9) where $H$ is a random matrix in $\mathbb{Z}_p^{l \times n}$,
$\Delta H$ is a random matrix in $\mathbb{Z}_p^{k \times n}$, $c$ is a
random vector in $\mathbb{Z}_p^l$ and $\Delta c$ is a random vector in
$\mathbb{Z}_p^k$. Define $\Lambda_i(\mathbb{C}_i,\gamma,p)$ and
$\Lambda_o(\mathbb{C}_o,\gamma,p)$ accordingly. The set of messages
consists of all bins $\mathfrak{B}_m$ indexed by $m \in \mathbb{Z}_p^k$.

For $m \in \mathbb{Z}_p^k$, let $\mathfrak{B}_m$ be the $m$th bin of
$\Lambda_i$ in $\Lambda_o$. The encoder observes the source sequence
$x \in \mathcal{X}^n$ and looks for a vector $u$ in the outer code
$\Lambda_o$ which is typical with $x$ and encodes the sequence $x$ to
the bin of $\Lambda_i$ in $\Lambda_o$ containing $u$.
The encoder declares an error if it does not find such a vector.

Having observed the index of the bin $m$ and the side
information $s$, the decoder looks for a unique sequence $u$
in the $m$th bin which is jointly typical with $s$ and outputs
$f(u, s)$. Otherwise it declares an error.

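1) Encoding Error: Define $S'$ as in (15). For $u \in S'$ define

$$g(u) = \frac{u}{\gamma} + \frac{p-1}{2}$$

so that $g(u) \in \mathbb{Z}_p^n$ and $u \in \Lambda_o$ if and only if
$Hg(u) = c$. $g(u)$ has the following properties:

Lemma IV.1. For $u \in S'$,

$$P(u \in \Lambda_o) = P(Hg(u) = c) = \frac{1}{p^l}$$

i.e. all points of $S'$ lie on the outer lattice equiprobably.

Proof: Follows from the fact that $c$ is independent of $H$
and is uniformly distributed over $\mathbb{Z}_p^l$. ■

Lemma IV.2. For $u \in S'$ and $\tilde{u} \in S'$, if
$u \ne \tilde{u}$,

$$P(u \in \Lambda_o,\ \tilde{u} \in \Lambda_o) = P(Hg(u) = c,\ Hg(\tilde{u}) = c) = \frac{1}{p^{2l}}$$

i.e. all points of $S'$ lie on the outer lattice independently.

Proof: Note that

$$P(Hg(u) = c,\ Hg(\tilde{u}) = c) = P(Hg(u) = c,\ H(g(u) - g(\tilde{u})) = 0)$$
$$\overset{(a)}{=} P(Hg(u) = c) \times P(H(g(u) - g(\tilde{u})) = 0)$$
$$\overset{(b)}{=} \frac{1}{p^{2l}}$$

where (a) follows since $c$ is uniform and independent of $H$
and (b) follows since $H$ and $c$ are uniform and
$g(u) - g(\tilde{u})$ is nonzero. ■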
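Lemmas IV.1 and IV.2 can be verified by brute force for tiny parameters. The sketch below enumerates every pair (H, c) of the random parity-check ensemble for the assumed toy values p = 3, n = 2, l = 1 and computes the exact probabilities; the names `on_outer_code`, `prob`, `single` and `joint` are hypothetical.

```python
from fractions import Fraction
from itertools import product

p, n, l = 3, 2, 1  # toy parameters
points = list(product(range(p), repeat=n))  # the possible values of g(u)
# Each ensemble member is (H, c), with H stored row-major as a flat tuple.
ensemble = list(product(product(range(p), repeat=n * l), product(range(p), repeat=l)))

def on_outer_code(v, H, c):
    """Check Hv = c (mod p)."""
    return all(sum(H[r * n + j] * v[j] for j in range(n)) % p == c[r] for r in range(l))

def prob(event):
    """Exact probability of an event under the uniform ensemble."""
    return Fraction(sum(1 for H, c in ensemble if event(H, c)), len(ensemble))

single = {prob(lambda H, c, v=v: on_outer_code(v, H, c)) for v in points}
joint = {prob(lambda H, c, v=v, w=w: on_outer_code(v, H, c) and on_outer_code(w, H, c))
         for v in points for w in points if v != w}
```

Both sets collapse to a single value: every point lies on the outer code with probability 1/p^l, and every pair of distinct points lies on it jointly with probability 1/p^(2l).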
For a source sequence $x \in \mathcal{X}^n$, the encoder declares an
error if there is no sequence $u \in \Lambda_o$ jointly typical with
$x$. Define

$$\theta(x) = \sum_{u \in \Lambda_o} 1_{\{u \in A_\epsilon^n(\hat{U}|x)\}}$$

Let $Z$ be a uniform random variable over
$\gamma(\mathbb{Z}_p - \frac{p-1}{2})$ and $Z^n$ a uniform random
variable over $S'$. We need the following lemmas to proceed:

Lemma IV.3. With the above construction $|\Lambda_o| = p^{n-l}$ with
high probability. Specifically,

$$P(\mathrm{rank}(H) = l) = \frac{(p^n - 1)(p^n - p)(p^n - p^2) \cdots (p^n - p^{l-1})}{p^{nl}} \ge 1 - \frac{1}{p^{n-l}}$$

and hence the probability that $|\Lambda_o| = p^{n-l}$ is close to one if
n is large. Furthermore, for $i = 1, 2, \ldots, l$,

$$P(\mathrm{rank}(H) = i) \le \binom{l}{i} \frac{p^{ni}\, p^{i(l-i)}}{p^{nl}}$$

Proof: The first part of the lemma follows since the total
number of choices for $H$ is equal to $p^{nl}$ and the number
of choices with independent rows is equal to
$(p^n - 1)(p^n - p)(p^n - p^2) \cdots (p^n - p^{l-1})$. Now we show
the upper bounds. For a matrix $H$ to have rank $i$, there should exist
$i$ independent rows and the rest of the rows must be linear
combinations of these rows (there are $p^i$ such linear combinations).
Hence the total number of such matrices is upper bounded
by

$$\binom{l}{i}\, p^{ni}\, p^{i(l-i)}$$

The lemma follows by dividing this quantity by the total number
$p^{nl}$ of choices for $H$. ■
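The full-rank probability in Lemma IV.3 and its lower bound are easy to evaluate exactly. The sketch below uses rational arithmetic; the function names and the parameter triples tried are illustrative assumptions only.

```python
from fractions import Fraction

def prob_full_rank(p, n, l):
    """P(rank(H) = l) for H uniform over Z_p^{l x n}:
    (p^n - 1)(p^n - p)...(p^n - p^(l-1)) / p^(n*l)."""
    count = 1
    for j in range(l):
        count *= p ** n - p ** j
    return Fraction(count, p ** (n * l))

def lower_bound(p, n, l):
    """The bound 1 - 1/p^(n-l) stated in the lemma."""
    return 1 - Fraction(1, p ** (n - l))

examples = [(3, 2, 1), (5, 6, 3), (2, 10, 4)]
gaps = [prob_full_rank(*e) - lower_bound(*e) for e in examples]
```

For each parameter triple the exact probability sits above the bound, and the bound itself approaches one as n - l grows, which is the regime used in the proof.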
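Lemma IV.4. With $\theta(x)$ and $Z^n$ defined as above, we have

$$E\{\theta(x)\} \le p^{n-l} P(Z^n \in A_\epsilon^n(\hat{U}|x)) + \sum_{r \ne p^{n-l}} P(|\Lambda_o| = r)\, r$$

Proof: Write the random lattice $\Lambda_o$ as
$\{u_1(\Lambda_o), u_2(\Lambda_o), \ldots, u_r(\Lambda_o)\}$ where $r$
is the cardinality of $\Lambda_o$ and
$u_1(\Lambda_o), u_2(\Lambda_o), \ldots, u_r(\Lambda_o)$ are picked
without replacement from $\Lambda_o$. It follows from Lemma IV.1 that
given $|\Lambda_o| = r = p^{n-l}$,
$u_1(\Lambda_o), u_2(\Lambda_o), \ldots, u_r(\Lambda_o)$
are each uniformly distributed random variables over $S'$. To see
this, note that for arbitrary $u \in S'$, since
$u_1(\Lambda_o), u_2(\Lambda_o), \ldots, u_r(\Lambda_o)$ are picked
randomly from $\Lambda_o$,

$$P(u = u_1(\Lambda_o)) = P(u = u_2(\Lambda_o)) = \cdots = P(u = u_r(\Lambda_o))$$

Therefore

$$P(u \in \Lambda_o) = \sum_{i=1}^{r} P(u = u_i(\Lambda_o)) = r P(u = u_1(\Lambda_o)) = \frac{1}{p^l}$$

Hence if $r = p^{n-l}$ then $u_1(\Lambda_o)$ is uniform over $S'$. This
argument is valid for all $i = 1, \ldots, r$ and hence if
$r = p^{n-l}$ then $u_i(\Lambda_o)$ is uniform over $S'$. Note that

$$E\{\theta(x)\} = E\{E\{\theta(x) \mid |\Lambda_o| = r\}\}$$

The conditional expectation on the right hand side of this
equation is upper bounded by $r$ and for $r = p^{n-l}$ it is
equal to

$$E\{\theta(x) \mid |\Lambda_o| = p^{n-l}\} = E\Big\{\sum_{u \in \Lambda_o} 1_{\{u \in A_\epsilon^n(\hat{U}|x)\}}\Big\}$$
$$= \sum_{i=1}^{r} P\left(u_i(\Lambda_o) \in A_\epsilon^n(\hat{U}|x)\right)$$
$$\overset{(a)}{=} \sum_{i=1}^{p^{n-l}} P\left(Z^n \in A_\epsilon^n(\hat{U}|x)\right)$$
$$= p^{n-l} P\left(Z^n \in A_\epsilon^n(\hat{U}|x)\right)$$

where (a) follows since $u_i(\Lambda_o)$ is uniformly distributed over
$S'$ for all $i = 1, \ldots, r$. Next note that

$$E\{\theta(x)\} = \sum_r P(|\Lambda_o| = r)\, E\{\theta(x) \mid |\Lambda_o| = r\}$$
$$\le P(|\Lambda_o| = p^{n-l})\, p^{n-l} P\left(Z^n \in A_\epsilon^n(\hat{U}|x)\right) + \sum_{r \ne p^{n-l}} P(|\Lambda_o| = r)\, r$$

which gives the bound of the lemma. ■

Similarly,

$$E\{\theta(x)\} = \sum_r P(|\Lambda_o| = r)\, E\{\theta(x) \mid |\Lambda_o| = r\} \ge \left(1 - \frac{1}{p^{n-l}}\right) p^{n-l} P\left(Z^n \in A_\epsilon^n(\hat{U}|x)\right)$$

Therefore,

$$E\{\theta(x)\} \approx p^{n-l}\, 2^{-n[D(P_{\hat{U}X}\|P_Z P_X) + O(\epsilon)]}$$

Similarly,

$$\theta(x)^2 = \sum_{u, \tilde{u} \in \Lambda_o} 1_{\{u, \tilde{u} \in A_\epsilon^n(\hat{U}|x)\}} = \sum_{u \in \Lambda_o} 1_{\{u \in A_\epsilon^n(\hat{U}|x)\}} + \sum_{\substack{u, \tilde{u} \in \Lambda_o \\ u \ne \tilde{u}}} 1_{\{u, \tilde{u} \in A_\epsilon^n(\hat{U}|x)\}}$$

It can be shown that

$$E\{\theta(x)^2\} = E\{|\Lambda_o|\} P\left(Z^n \in A_\epsilon^n(\hat{U}|x)\right) + E\{|\Lambda_o|\}^2 P\left(Z^n \in A_\epsilon^n(\hat{U}|x)\right)^2$$
$$\approx p^{n-l}\, 2^{-n[D(P_{\hat{U}X}\|P_Z P_X) + O(\epsilon)]} + p^{2(n-l)}\, 2^{-2n[D(P_{\hat{U}X}\|P_Z P_X) + O(\epsilon)]}$$

Hence

$$\mathrm{var}\{\theta(x)\} = E\{\theta(x)^2\} - E\{\theta(x)\}^2 \approx p^{n-l}\, 2^{-n[D(P_{\hat{U}X}\|P_Z P_X) + O(\epsilon)]}$$

Hence, by Chebyshev's inequality,

$$P(\theta(x) = 0) \le \frac{\mathrm{var}\{\theta(x)\}}{E\{\theta(x)\}^2} \approx p^{-(n-l)}\, 2^{n[D(P_{\hat{U}X}\|P_Z P_X) + O(\epsilon)]}$$

Therefore if

$$\frac{l}{n}\log p < \log p - D(P_{\hat{U}X}\|P_Z P_X) \qquad (23)$$

the probability of encoding error goes to zero as the block
length increases.

2) Decoding Error: After observing $m$ and the side
information $s$, the decoder declares error if it does not find a
sequence in the bin $\mathfrak{B}_m$ jointly typical with $s$ or if
there are multiple such sequences. We will show that the probability
that a sequence $\tilde{u} \ne u$ is in the same bin as $u$ and is
jointly typical with $s$ goes to zero as the block length increases if
$\frac{k+l}{n}\log p > \log p - D(P_{\hat{U}S}\|P_Z P_S)$. The
probability of decoding error is upper bounded by

$$P_{err} \le \sum_{\substack{\tilde{u} \in S' \\ \tilde{u} \ne u}} P\left(\tilde{u} \in \mathfrak{B}_m,\ \tilde{u} \in A_\epsilon^n(\hat{U}|s)\right) \approx p^n\, p^{-(k+l)}\, P\left(Z^n \in A_\epsilon^n(\hat{U}|s)\right) = p^{n-(k+l)}\, 2^{-n[D(P_{\hat{U}S}\|P_Z P_S) + O(\epsilon)]}$$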
Hence the probability of decoding error goes to zero if

$$\frac{k+l}{n}\log p > \log p - D(P_{\hat{U}S}\|P_Z P_S) \qquad (24)$$

3) The Achievable Rate: Using (23) and (24) we conclude
that if we choose $\frac{l}{n}\log p$ sufficiently close to
$\log p - D(P_{\hat{U}X}\|P_Z P_X)$ and $\frac{k+l}{n}\log p$
sufficiently close to $\log p - D(P_{\hat{U}S}\|P_Z P_S)$ we can
achieve the rate

$$R = \frac{k}{n}\log p \approx D(P_{\hat{U}X}\|P_Z P_X) - D(P_{\hat{U}S}\|P_Z P_S) = I(X;U) - I(S;U)$$

V. Conclusion 

We have shown that nested lattice codes are optimal for the 
Gelfand-Pinsker problem as well as the Wyner-Ziv problem. 

VI. Appendix 

A. Proof of Lemma \in.4\ 

The proof follows along the lines of the proof of Theorem
21 of [18]. Let $Q = \{A_1, A_2, \cdots, A_r\}$ be a finite partition
of $\mathbb{R}$. Let $Q_{XYZ}$, $Q_{XY}$, $Q_{XZ}$, $Q_{YZ}$, $Q_X$, $Q_Y$ and $Q_Z$ be
the measures induced by this partition, corresponding to $P_{XYZ}$,
$P_{XY}$, $P_{XZ}$, $P_{YZ}$, $P_X$, $P_Y$ and $P_Z$ respectively. For the
random sequence $Z^n = (Z_1, \cdots, Z_n)$ and the deterministic
sequence $y = (y_1, \cdots, y_n)$ let $\bar{Q}_y$ be the deterministic
empirical measure of $y$ and define the random empirical measures

$$\bar{Q}_{Zy}(A_i, A_j) = \frac{1}{n} \sum_{l=1}^{n} 1_{\{Z_l \in A_i,\; y_l \in A_j\}}$$

$$\bar{Q}_{Z}(A_i) = \frac{1}{n} \sum_{l=1}^{n} 1_{\{Z_l \in A_i\}}$$

for $i,j = 1,2,\cdots,r$. As a property of weakly* typical
sequences, for a fixed $\epsilon_1 > 0$, there exists a sufficiently small
$\epsilon > 0$ such that for a sequence pair $(x,y) \in A_\epsilon^n(XY)$ and for
all $i,j = 1,2,\cdots,r$,

$$|\bar{Q}_{xy}(A_i, A_j) - Q_{XY}(A_i, A_j)| < \epsilon_1$$

where $\bar{Q}_{xy}$ is the joint empirical measure of $(x,y)$. It follows
that the rare event $(Z^n, y) \in A_\epsilon^n(XY)$ is included in the
intersection of the events

$$\{|\bar{Q}_{Zy}(A_i, A_j) - Q_{XY}(A_i, A_j)| < \epsilon_1\} \qquad (25)$$

for $i,j = 1,2,\cdots,r$. Therefore

$$Q_Z^n\big((Z^n, y) \in A_\epsilon^n(XY)\big) \le Q_Z^n\Big(\bigcap_{i,j=1}^{r} \{|\bar{Q}_{Zy}(A_i, A_j) - Q_{XY}(A_i, A_j)| < \epsilon_1\}\Big)$$
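The quantized joint empirical measure and the closeness test in (25) can be sketched as follows. This is a minimal illustration, not the paper's construction: it assumes the partition of the real line consists of intervals given by a sorted list of cut points, and the names `cell_index`, `joint_empirical_measure` and `within_tolerance` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

def cell_index(value, edges):
    """Index of the interval containing value; sorted edges give len(edges)+1 cells."""
    return bisect_right(edges, value)

def joint_empirical_measure(z, y, edges):
    """Fraction of positions l with z[l] in cell i and y[l] in cell j, keyed by (i, j)."""
    n = len(z)
    counts = Counter((cell_index(a, edges), cell_index(b, edges))
                     for a, b in zip(z, y))
    return {cell: c / n for cell, c in counts.items()}

def within_tolerance(empirical, target, r, eps1):
    """True if every cell pair of the two measures differs by less than eps1."""
    return all(abs(empirical.get((i, j), 0.0) - target.get((i, j), 0.0)) < eps1
               for i in range(r) for j in range(r))

# Two cells: (-inf, 0) and [0, +inf). Sequences are illustrative.
edges = [0.0]
z = [-1.0, 0.5, 2.0, -0.3]
y = [0.2, 0.7, -1.5, -0.1]
emp = joint_empirical_measure(z, y, edges)
uniform = {(i, j): 0.25 for i in range(2) for j in range(2)}
print(within_tolerance(emp, uniform, 2, 0.1))
```

In the proof the target measure plays the role of the quantized joint measure of (X, Y), and the test is applied to the pair formed by the random sequence and the fixed sequence y.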

Let $\epsilon$ be such that for $j = 1, \cdots, r$,

$$|\bar{Q}_y(A_j) - Q_Y(A_j)| < \epsilon_1$$

$$1 - \epsilon_1 < \frac{\bar{Q}_y(A_j)}{Q_Y(A_j)} < 1 + \epsilon_1$$



Note that if $Q_Y(A_j) = 0$ then $Q_{XY}(A_i, A_j) = 0$ and hence

$$|\bar{Q}_{Zy}(A_i, A_j) - Q_{XY}(A_i, A_j)| = \bar{Q}_{Zy}(A_i, A_j) \le \bar{Q}_y(A_j) < \epsilon_1$$

and (25) is satisfied. If we choose $\epsilon_1$ smaller than any nonzero
$Q_Y(A_j)$ it follows that $\bar{Q}_y(A_j) > 0$ whenever $Q_Y(A_j) > 0$.
Now assume that $Q_Y(A_j) > 0$ and hence $\bar{Q}_y(A_j) > 0$. Define

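$$Q_{X|Y}(A_i|A_j) = \frac{Q_{XY}(A_i, A_j)}{Q_Y(A_j)}$$

$$\bar{Q}_{Z|y}(A_i|A_j) = \frac{\bar{Q}_{Zy}(A_i, A_j)}{\bar{Q}_y(A_j)}$$

If $Q_Y(A_j) > 0$, the event in (25) is included in the event

$$\{|\bar{Q}_{Z|y}(A_i|A_j)\bar{Q}_y(A_j) - Q_{X|Y}(A_i|A_j)\bar{Q}_y(A_j) + Q_{X|Y}(A_i|A_j)\bar{Q}_y(A_j) - Q_{X|Y}(A_i|A_j)Q_Y(A_j)| < \epsilon_1\} \qquad (26)$$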

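Note that

$$|Q_{X|Y}(A_i|A_j)\bar{Q}_y(A_j) - Q_{X|Y}(A_i|A_j)Q_Y(A_j)| = Q_{X|Y}(A_i|A_j)\,|\bar{Q}_y(A_j) - Q_Y(A_j)| < \epsilon_1$$

Therefore (26) implies

$$\{|\bar{Q}_{Z|y}(A_i|A_j) - Q_{X|Y}(A_i|A_j)|\,\bar{Q}_y(A_j) < 2\epsilon_1\}$$

And this implies

$$\Big\{|\bar{Q}_{Z|y}(A_i|A_j) - Q_{X|Y}(A_i|A_j)| < \frac{2\epsilon_1}{\bar{Q}_y(A_j)}\Big\}$$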



Let

$$\epsilon_2 = \sum_{\substack{j=1 \\ Q_Y(A_j) > 0}}^{r} \frac{2\epsilon_1}{Q_Y(A_j)(1 - \epsilon_1)}$$

then the event in (25) is included in the event

$$\{|\bar{Q}_{Z|y}(A_i|A_j) - Q_{X|Y}(A_i|A_j)| < \epsilon_2\}$$

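Therefore

$$Q_Z^n\big((Z^n, y) \in A_\epsilon^n(XY)\big) \le Q_Z^n\Big(\bigcap_{\substack{i,j=1 \\ Q_Y(A_j) > 0}}^{r} \{|\bar{Q}_{Z|y}(A_i|A_j) - Q_{X|Y}(A_i|A_j)| < \epsilon_2\}\Big)$$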



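Note that since $y$ is a deterministic sequence and the $Z_i$'s are iid,
the events

$$\{|\bar{Q}_{Z|y}(A_i|A_j) - Q_{X|Y}(A_i|A_j)| < \epsilon_2\}$$

are independent for different values of $j = 1, \cdots, r$. Let $n_j =
n\bar{Q}_y(A_j)$. Then,

$$Q_Z^n\big((Z^n, y) \in A_\epsilon^n(XY)\big) \le \prod_{\substack{j=1 \\ Q_Y(A_j) > 0}}^{r} Q_Z^n\Big(\bigcap_{i=1}^{r} \{|\bar{Q}_{Z|y}(A_i|A_j) - Q_{X|Y}(A_i|A_j)| < \epsilon_2\}\Big)$$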
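Since for $Q_Y(A_j) > 0$, $n_j \to \infty$ as $n \to \infty$, it follows from
Sanov's theorem [22] that

$$\limsup_{n \to \infty} \frac{1}{n_j} \log Q_Z^n\Big(\bigcap_{i=1}^{r} \{|\bar{Q}_{Z|y}(A_i|A_j) - Q_{X|Y}(A_i|A_j)| < \epsilon_2\}\Big) \le -\big[D(Q_{X|Y}(\cdot|A_j) \| Q_Z(\cdot)) - \delta_j\big]$$

where $\delta_j \to 0$ as $\epsilon_2 \to 0$. Therefore

$$
\begin{aligned}
\limsup_{n \to \infty} \frac{1}{n} \log Q_Z^n\big((Z^n, y) \in A_\epsilon^n(XY)\big)
&\le \sum_{\substack{j=1 \\ Q_Y(A_j) > 0}}^{r} \limsup_{n \to \infty} \, -\frac{n_j}{n}\big[D(Q_{X|Y}(\cdot|A_j) \| Q_Z(\cdot)) - \delta_j\big] \\
&\le -\sum_{\substack{j=1 \\ Q_Y(A_j) > 0}}^{r} (1 - \epsilon_1)\, Q_Y(A_j)\big[D(Q_{X|Y}(\cdot|A_j) \| Q_Z(\cdot)) - \delta_j\big] \\
&\le -(1 - \epsilon_1)\, D(Q_{XY} \| Q_Z Q_Y) + \delta'
\end{aligned}
$$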

where $\delta' \to 0$ as $\epsilon_2 \to 0$. For finite $D(P_{XY} \| P_Z P_Y)$ the
statement of the lemma follows by choosing the quantization
$Q$ such that $D(Q_{XY} \| Q_Z Q_Y)$ is sufficiently close to
$D(P_{XY} \| P_Z P_Y)$.
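The final step of the bound uses the fact that the weighted sum of the conditional divergences over the cells with nonzero mass equals the joint divergence. A small numerical check of that decomposition is sketched below; the quantized measures `q_xy` and `q_z` are hypothetical values chosen only for illustration, with one cell of Y deliberately given zero mass.

```python
import math

def kl_bits(p, q):
    """KL divergence D(p||q) in bits for two pmfs given as lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical quantized joint measure: rows index the X cell i, columns the Y cell j.
q_xy = [[0.20, 0.05, 0.0],
        [0.10, 0.25, 0.0],
        [0.10, 0.30, 0.0]]
q_z = [0.5, 0.3, 0.2]  # hypothetical quantized measure of Z
r = len(q_z)

# Marginal of Y: column sums of the joint measure.
q_y = [sum(q_xy[i][j] for i in range(r)) for j in range(r)]

# Weighted sum of conditional divergences over cells with nonzero mass.
lhs = sum(q_y[j] * kl_bits([q_xy[i][j] / q_y[j] for i in range(r)], q_z)
          for j in range(r) if q_y[j] > 0)

# Joint divergence D(Q_XY || Q_Z Q_Y), computed directly.
rhs = sum(q_xy[i][j] * math.log2(q_xy[i][j] / (q_z[i] * q_y[j]))
          for i in range(r) for j in range(r) if q_xy[i][j] > 0)

print(abs(lhs - rhs) < 1e-9)
```

Cells with zero mass contribute nothing to either side, which matches the restriction of the sums in the proof to the cells with nonzero measure.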

B. Proof of Lemma III.5

The proof follows along the lines of the proof of Theorem
22 of [18]. Let $Q = \{A_1, A_2, \cdots, A_r\}$ be a finite partition
of $\mathbb{R}$. Let $Q_{XYZ}$, $Q_{XY}$, $Q_{XZ}$, $Q_{YZ}$, $Q_X$, $Q_Y$ and $Q_Z$ be
the measures induced by this partition, corresponding to $P_{XYZ}$,
$P_{XY}$, $P_{XZ}$, $P_{YZ}$, $P_X$, $P_Y$ and $P_Z$ respectively. For the
random sequence $Z^n = (Z_1, \cdots, Z_n)$ and the deterministic
sequence $y = (y_1, \cdots, y_n)$ let $\bar{Q}_y$ be the deterministic
empirical measure of $y$ and define the random empirical measures

$$\bar{Q}_{Zy}(A_i, A_j) = \frac{1}{n} \sum_{l=1}^{n} 1_{\{Z_l \in A_i,\; y_l \in A_j\}}$$

$$\bar{Q}_{Z}(A_i) = \frac{1}{n} \sum_{l=1}^{n} 1_{\{Z_l \in A_i\}}$$

For arbitrary $\epsilon > 0$, let $Q$ be such that

$$\pi(Q_{XY}, P_{XY}) < \epsilon$$
$$\pi(Q_{ZY}, P_{ZY}) < \epsilon$$

We show that for such a quantization, under certain conditions,
the probability of the event

$$\{\pi(\bar{Q}_{Zy}, Q_{XY}) < \epsilon\}$$

is close to the probability of the event

$$\{\pi(\bar{P}_{Zy}, P_{XY}) < 5\epsilon\}$$

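It follows from Theorem 18 of [18] that for arbitrary $\epsilon, \delta > 0$,
there exists some $\tilde{\epsilon} > 0$ such that for all $n$ greater than some
$N$, if $y \in A_{\tilde{\epsilon}}^n(Y)$, then

$$P\big(\pi(\bar{P}_{Zy}, P_{ZY}) < \epsilon\big) > 1 - \delta$$
$$P\big(\pi(\bar{Q}_{Zy}, Q_{ZY}) < \epsilon\big) > 1 - \delta$$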

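Consider the event

$$\{\pi(\bar{Q}_{Zy}, Q_{XY}) < \epsilon,\; \pi(\bar{P}_{Zy}, P_{ZY}) < \epsilon,\; \pi(\bar{Q}_{Zy}, Q_{ZY}) < \epsilon\}$$

This event implies

$$
\begin{aligned}
\pi(\bar{P}_{Zy}, P_{XY}) &\le \pi(\bar{P}_{Zy}, P_{ZY}) + \pi(Q_{ZY}, P_{ZY}) \\
&\quad + \pi(\bar{Q}_{Zy}, Q_{ZY}) + \pi(\bar{Q}_{Zy}, Q_{XY}) \\
&\quad + \pi(Q_{XY}, P_{XY}) < 5\epsilon
\end{aligned}
$$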

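Therefore

$$P\big(\pi(\bar{P}_{Zy}, P_{XY}) < 5\epsilon\big) \ge P\big(\pi(\bar{Q}_{Zy}, Q_{XY}) < \epsilon,\; \pi(\bar{P}_{Zy}, P_{ZY}) < \epsilon,\; \pi(\bar{Q}_{Zy}, Q_{ZY}) < \epsilon\big)$$

The right hand side can be lower bounded by

$$1 - P\big(\pi(\bar{Q}_{Zy}, Q_{XY}) \ge \epsilon\big) \qquad (27)$$
$$- P\big(\pi(\bar{P}_{Zy}, P_{ZY}) \ge \epsilon\big) - P\big(\pi(\bar{Q}_{Zy}, Q_{ZY}) \ge \epsilon\big) \qquad (28)$$
$$\ge P\big(\pi(\bar{Q}_{Zy}, Q_{XY}) < \epsilon\big) - \delta - \delta \qquad (29)$$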
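The lower bound in (27)-(29) is the union bound applied to the complements of the three events. The sketch below checks the underlying inequality exhaustively on a small uniform sample space, using bitmasks for events; the sample-space size and all names are arbitrary illustrative choices, and counts stand in for probabilities scaled by the number of outcomes.

```python
from itertools import product

N = 5                  # number of equally likely outcomes (illustrative)
FULL = (1 << N) - 1    # bitmask of the whole sample space

def size(mask):
    """Number of outcomes in the event encoded by the bitmask."""
    return bin(mask).count("1")

def union_bound_holds(a, b, c):
    """Check |A & B & C| >= |A| - |B^c| - |C^c|."""
    lhs = size(a & b & c)
    rhs = size(a) - size(FULL & ~b) - size(FULL & ~c)
    return lhs >= rhs

# Enumerate every triple of events over the sample space.
all_hold = all(union_bound_holds(a, b, c)
               for a, b, c in product(range(FULL + 1), repeat=3))
print(all_hold)
```

In the proof, A is the rare event on the quantized measures while B and C are the two high-probability events, each of whose complements has probability at most the chosen small constant.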

Note that for arbitrary $\delta'$ and for sufficiently large $n$,

$$P\big(\pi(\bar{Q}_{Zy}, Q_{XY}) < \epsilon\big) \ge 2^{-n[D(Q_{XY} \| Q_Z Q_Y) + \delta']}$$

Since $\delta, \delta'$ are arbitrary and $D(Q_{XY} \| Q_Z Q_Y) \approx
D(P_{XY} \| P_Z P_Y)$, it follows that

$$P\big(\pi(\bar{P}_{Zy}, P_{XY}) < 5\epsilon\big) \ge 2^{-n[D(P_{XY} \| P_Z P_Y) + \delta' + \epsilon']} - 2\delta$$